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ABSTRACT 

Subhalo abundance matching (SHAM) is a widely-used method to connect galaxies with 
dark matter structures in numerical simulations. SHAM predictions agree remarkably well 
with observations, yet they still lack strong theoretical support. We examine the perfor¬ 
mance, implementation, and assumptions of SHAM using the EAGLE project simulations. 
We find that Vreiax, the highest value of the circular velocity attained by a subhalo while it 
satisfies a relaxation criterion, is the subhalo property that correlates most strongly with 
galaxy stellar mass (Mgtar)- Using this parameter in SHAM, we retrieve the real-space 
clustering of EAGLE to within our statistical uncertainties on scales greater than 2 Mpc 
for galaxies with 8.77 < log 2 o(A 7 star[M 0 ]) < 10.77. Gonversely, clustering is overestimated 
by 30% on scales below 2 Mpc for galaxies with 8.77 < logiQ(Mstar[M 0 ]) < 9.77 because 
SHAM slightly overpredicts the fraction of satellites in massive haloes compared to EA¬ 
GLE. The agreement is even better in redshift-space, where the clustering is recovered 
to within our statistical uncertainties for all masses and separations. Additionally, we 
analyse the dependence of galaxy clustering on properties other than halo mass, i.e. the 
assembly bias. We demonstrate assembly bias alters the clustering in EAGLE by 20 % 
and Vreiax captures its effect to within 15 %. We trace small differences in the clustering 
to the failure of SHAM as typically implemented, i.e. the Mgtar assigned to a subhalo does 
not depend on i) its host halo mass, ii) whether it is a central or a satellite. In EAGLE 
we find that these assumptions are not completely satisfied. 

Key words: large-scale structure of the Universe - dark matter - galaxies: haloes - 
galaxies: formation - galaxies: evolution 


1 INTRODUCTION 

The clustering of galaxies offers an excellent window to ex¬ 
plore galaxy formation processes and the fundamental prop¬ 
erties of our Universe. On small scales, correlation functions 
can inform us about the way in which galaxies populate dark 
matter (DM) haloes and thus about the efficiency of star for¬ 
mation and the importance of environmental effects. On large 
scales, the clustering of galaxies can be used to constrain cos¬ 
mological parameters and the law of gravity. On even larger 
scales, the observed distribution of galaxies is sensitive to the 
physics of inflation and relativistic effects. By using correlation 
functions of different orders and at distinct scales, degenera¬ 
cies among several parameters can be broken, providing even 
tighter constrains on all the aforementioned quantities. 

To extract the information encoded in the clustering of 
galaxies, we need accurate predictions for a given cosmologi¬ 
cal scenario and galaxy formation model. However, obtaining 
the correct galaxy distribution is a difficult task, especially at 


small scales where besides highly non-linear dynamics, gravi¬ 
tational collapse, mergers, dynamical friction, and tidal strip¬ 
ping; baryonic processes such as star formation, feedback, and 
ram pressure are at play. Consequently, one needs to resort 
to numerical simulati ons to obtain accu rate predictions for 
galaxy clustering fsee lKuhlen et aLll2012l . for a review). 

Two types of approach can be followed. The first is to 
simulate the joint evolution of DM and baryons by solving the 
Poisson and Euler equations coupled with recipes for unre¬ 
solved physical processes (e.g. star and black hole formation). 
Although this approach currently yields the most direct pre¬ 
dictions for the distribution of galaxies, it is computationally 
infeasible to simulate large cosmological volumes with ade¬ 
quate resolution for calculating accurately the galaxy cluster¬ 
ing on scales on the order of 100 h~^ Mpc. In addition, simu¬ 
lations have onl y recently begun to produce populations of r e- 
alistic galaxies dVogelsberger et Tll20l4 ISchaye et al.ll2015l 'l. 

The second approach is to simulate only grayitational in- 
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teractions and to predict the galaxy clustering a posteriori. 
This is justified by leading theories of galaxy formation, where 
DM plays the dominant role in determining the places where 
galaxies form and merge. Gravity-only simulations (a.k.a. DM- 
only simulations) are computationally less expensive and can 
thus follow sufficiently large volumes to enable the correct in¬ 
terpretation of observational surveys. This is an important 
advantage since, for instance, to model galaxy clustering on 
scales beyond 100 h~^ Mpc, it is necessary to pe rform A^-body 
simu lations of volumes in excess of 1 h~^ Gpc® (lAngulo et al.l 
l2008l 'l. The disadvantage is that the predictions for galaxy clus¬ 
tering are more uncertain because the relation between galax¬ 
ies and DM haloes is not straightforward. 

Subhalo abund an ce matching _ (S HAM, e.g., 

Va^ fc Ostrikerj l2004l : IShankar et al.l l 200 d : iGonrov et al'l 


is a widely-used method to populate gravity-only simu¬ 
lations with galaxies. The original version of SHAM assumes 
an injective and monotonic relation between galaxies and self¬ 
bound DM structures based on a set of specified properties. 
SHAM usually links galaxies to DM structures using stellar 
mass as galaxy property and a measure of subhalo mass, 
such as circular velocity, as subhalo property. More recent 
implementations introduce stochasticit y into the relat i on to 
make the model more realistic (e.g., Behroozi et al.| '20ld: 
Tnfi illo-Gomez et aI.ll201ll : lReddick et ahl 201^nZentneret*al] 


2014h . Then, SHAM places each galaxy at the centre-of- 


potential of its corresponding subhalo and assumes that 
each galaxy has the same velocity as the centre-of-mass of 
its linked subhalo. SHAM thus makes predictions for the 
clustering of galaxies, but not for any physical properties such 
as stellar mass, star formation rate, metallicity, etc. 


markably well with observations (e.g. 

Gonrov et al. 

2006: 

Guo et al.l 201(ll; Wetzel & White! 201(11; 

Moster et al. 

2011 : 

Behroozi et al.l2010l:lTruiillo-Gomez et al 

l 201 ll:IWatson et al. 

2012 : Nuza et al.l 20131: iReddick et al. 2013IL For ins 

tance, 
le ob- 

Gonrov et al.l ll200fJl showed that SHAM reproduces t 


cut-off in the fraction of satellite galaxies) that provide enough 
freedom to become insensitive to them. The importance of the 
information being decoded, added to the fact that the amount 
and accuracy of clustering data will increase dramatically over 
the next decade due to the emergence of wide-field galaxy sur¬ 
veys (e.g. DES, HETDEX, eBOSS, JPAS, DESI, EUCLID, 
and LSST), makes it crucial to critically test the assumptions 
underlying SHAM. 

In this paper we will empl oy the state-of-the-art hydrody- 
nami cal simulations EAGLE llSchave et al.llioiRl : ICrain et al.l 
l2015l f to study the SHAM technique in detail. Our objectives 
are threefold, i) to seek the most accurate implementation of 
SHAM, ii) to directly test the underlying assumptions, and iii) 
to assert how accurately SHAM can predict galaxy clustering. 

We will propose Keiax, defined as the maximum of the 
circular velocity of a DM structure along its entire history 
while it fulfils a relaxation criterion, as the best subhalo prop¬ 
erty with which to perform SHAM. We will show that this 
definition captures the best qualities of previously proposed 
implementations while mitigating their disadvantages and re¬ 
ducing the number of problematic cases. As a consequence, 
Ureiax sliows the Strongest correlation with the simulated stel¬ 
lar mass of EAGLE galaxies. 

We will show that SHAM is able to reproduce the cluster¬ 
ing properties of stellar mass selected galaxies in the EAGLE 
simulation (which successfully reproduces many properties of 
observed low-z galaxies). For the stellar mass range investi¬ 
gated ( 10 ®’^^ < Matar[M 0 ] < 10 ^®’^^), the agreement is better 
than 10 % on scales greater than 2 Mpc, and better than 30 % 
on smaller scales. The agreement is particularly good for mas¬ 
sive galaxies and in redshift space, for which we do not find 
statistically significant difference between the clustering pre¬ 
dicted by SHAM and EAGLE. This is remarkable given that 
we explore almost two orders of magnitude in spatial scale and 
four in clustering amplitude. 

Additionally, we will pay attention to the so-called 


z < 5). More recentlv. iReddick et al.l ll2013l') achieved a si- 

haloes on properties other than mass (iGao et al. 

2005: 

multaneous fit to the clusteringanfi_&e conditional stellar 

Zhu et al. 200(1: Wechsler et al. l200fil: Groton et al. 

2007: 

mass function measured in SDSS. |Simha & Gole| (|2013|) even 

Gao & Whitell2007:IZu et al.ll2008l: iDalal et al.l 20081: iLi et al. 

used this model to constrain cosmological parameters, find- 
ing values in good agreement with those obtained from more 

20081: Lacerna & Padilla 2011). 

2 OI 2 I: Zentner et al.l 2014: 

Lacerna et al. 2014: Hearin et al. 

20141'). We will show that 


established methods. 

Despite these successes, the comparison with simula- 
tions of galaxy form ation has not been so encouraging. 

found that the galaxy clustering pre- 


IWeinberg et al 


dieted by SHAM only agrees with that of a hydrodynamical 
simulation beyond 1 h~^ Mpc. On smaller scales, the differ¬ 
ences were of the order of a few. ISimha et al.l ll 2012 l ~) extended 
the previous study using two hydrodynamic simulations with 
different feedback models. They found that the clustering pre¬ 
dicted by SHAM exceeded that of their most realistic simula¬ 
tion by more than a factor of 2 on scales below 0.5 h~^ Mpc. 
Finally, in a direct c omparison with t wo se mi-analytic models 
of galaxy formation. iGontreras et ahl (l2015l j found that SHAM 
performs well at some galaxy number densities, but not at oth¬ 
ers. 

It is therefore not clear whether SHAM is able to match 
the observed galaxy clustering because it makes accurate as¬ 
sumptions (i.e. the physical relation between subhaloes and 
galaxies) or because some implementations employ free param¬ 
eters (e.g. a scatter between subhalo and galaxy properties or a 


assembly bias is present in both EAGLE and SHAM galaxies, 
increasing the clustering amplitude by 20 % on scales from 2 to 
11 Mpc. To our knowledge, this is the first detection of assem¬ 
bly bias in a hydrodynamical simulation. This result supports 
the idea that Halo Occupation Distr i bution models (HOD, e.g ., 
ISeliakllioOfll : IPeacock fc Smitl3l200fll : [s coccimarro et al]l 200 lh . 
which are a phenomenological parametrization for the number 
of galaxies hosted by haloes of a given mass, introduce bias 
in the calculation of galaxy clustering when they assume that 
halo occupation is a function only of halo mass. 

Finally, we will track the small residual differences in the 
clustering of SHAM and EAGLE galaxies to the failure of a 
key assumption of SHAM (as commonly implemented): for 
the same Keiax, central and satellite subhaloes host the same 
galaxies independently of their host halo mass. We will find 
that this supposition is broken due to the influence of the en¬ 
vironment and the star formation that satellite galaxies expe¬ 
rience after having been accreted. Both effects correlate with 
the mass of the DM host, which suggests that future SHAM 
implementations that employ both host halo mass and Keiax 
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Table 1. EAGLE/DMO cosmological and numerical param- 
eters. The cosmological param eter values are taken from 
IPlanck Collaboration et alJ ll2ni4al [bh. 


Parameter 

EAGLE/DMO 


0.307 

Ha 

0.693 

Hb 

0.04825 

i7o[kms“^ Mpc“^] 

67.77 

0-8 

0.8288 

ris 

0.9611 

Max. proper softening [kpc] 

0.70 

Num. of baryonic particles 

15043 /- 

Num. of DM particles 

1504®/1504® 

Initial baryonic particle mass [IO^Mq] 

0.181/- 

DM particle mass [IO’^Mq] 

0.970/1.150 


Notes, flm, a-re the densities of matter, dark energy, 

and baryonic matter in units of the critical density at redshift zero. 
Hq is the present day Hubble expansion rate, cts is the linear fluctu¬ 
ation amplitude at 8 h~^ Mpc, and ria is the scalar spectral index. 


could yield even more accurate predictions for the clustering 
signal. 

Our paper is organized as follows. In §2 we describe the 
simulations, halo and galaxy catalogues, and merger trees that 
we use. In §3 we discuss different implementations of SHAM 
and introduce Keiax, a new proxy for stellar mass. In §4 we 
analyse the accuracy with which SHAM can predict the galaxy 
satellite fraction, host halo mass, clustering, and assembly 
bias. We discuss the limitations of SHAM in §5. We conclude 
and summarize our most important results in §6. 


2 NUMERICAL SIMULATIONS 

In this section we provide details of the main datasets that 
we employ. This includes a brief description of the numerical 
simulations, halo and galaxy catalogues, merger trees, and of 
a technique to identify the same structures in our hydrody- 
namical and gravity-only simulations. 


2.1 The EAGLE suite 


The simulations we analyse in this paper belong to the “Evolu¬ 
tion and Assembly of Galaxies and their Environ ment” project 
(EAGLE; ISchave et al.|[2015l : ICrain et al.ll201^ 1 conducted by 
the Virgo consortium. EAGLE is a suite of high-resolution hy- 
drodynamical simulations aimed at understanding the forma¬ 
tion of galaxies in a cosmolo gical volume. T he runs employed 
a pressure-entropy variant dHonkinsI l2013h of the Tree-PM 
smoo thed particle hydrodyna mics code GADGETS (ISpringell 
l2005l i , the time step limiters of iDurier fc Dalla Vecchial (l2012fi 
an d implement state-o f-the-art subgrid physics (as described 
by ISchave et al.ll2niRh . incl uding metal-dependent radiative 


dynamics Jwiersma et al.l 

2009b 1, gas accretion onto su- 

permassive black holes ( 

Rosas-Gueyara et al.l |2013|l. star 


(IPalla Vecchia fc Schavel 2012h . and AGN feedback. 


The EAGLE suite includes runs with different physical 
prescriptions, resolutions, and volumes. Here, we study the 
largest simulation, which follows 1504® gas particles and the 
same number of DM particles inside a periodic box with a side 


length of 100 Mpc. The large volume and high resolution of 
this simulation are essential for a careful analysis of SHAM. 
The cosmological parameters used in EAGLE are those pre¬ 
ferred by the analysis of Planck data (Table [TJ. This implies 
a gas particle mass equal to 1.81 x IO^Mq and a DM particle 
mass equal to 9.70 x 10® Mq. We highlight that EAGLE is well 
suited to this study because it was calibrated to reproduce the 
galaxy stellar mass function at z ~ 0. The agreement with ob¬ 
servations is especiall y good over the ma ss range that we will 
analyse here (fig. 4 of ISchave et al.l^2015^ . 

The 100 Mpc box was resimulated including only gravita¬ 
tional interactions and sampling the density field with 1504® 
particles of mass 1.15 x lO^M©. Hereafter, we refer to this 
simulation and its hydrodynamical counterpart as DMO and 
EAGLE, respectively. The cosmological and some of the nu¬ 
merical parameters employed in these simulations are provided 
in Table [T] 


2.2 Catalogues and mergers trees 

In each simulation, haloes were identified using only DM parti¬ 
cles and a standard friends-o f-friends (FOE) gro up-hnder with 
a linking parameter b = 0.2 llDavis et al.lIlQSal . Gas and star 
particles are assigned to the same FoF halo as their closest DM 
particle. For each FoF halo we compute a spherical-overdensity 
mass, M 200 , defined as the mass inside a sphere with mean 
density equal to 200 times the critical density of the Universe, 

Pcrit(z:); 


47r Q 

M 200 = ”^ 200 pcritr 2 OO, (1) 

where r 2 oo is the radius of the halo, Pcritiz) = , G is 

the gravitational constant, and H (z) is the value of the Hubble 
parameter H{z) = Hoy/0,inA + 2 )® + Ha- 

Self-bound structures inside FoF haloes, termed sub¬ 
haloes, were ide ntified using all particle types and th e SUB¬ 
FIND algorithm llSpringel et al.l200ll : lDolag et al.l2009l l. Here¬ 
after, we will refer to the subhalo located at the potential 
minimum of a giyen FoF halo as the “central”, to any other 
structures as “satellites”, and to subhaloes with more than 
one star particle as EAGLE “galaxies”. 

The position of each galaxy is assumed to be that of the 
particle situated at the minimum of the grayitational poten¬ 
tial of the respectiye subhalo. The galaxy yelocity is assumed 
to be that of the centre of mass of the subhalo [j. The stellar 
mass, Matar, is the total mass of all star particles linked to a 
giyen EAGLE galaxy. The gas mass (Mgas) and the dark mat¬ 
ter mass {Mum) are computed in the same manner but using 
gas particles or dark matter particles, respectiyely. We yeri- 
fied that our results are insensitiye to the exact dehnition of 
Mstai- we repeated our analysis defining ALtar as the mass in¬ 
side a sphere of 20, 30, 40, 50, 70, or 100 kpc radius. We found 
that different mass definitions only produces sub-percent dif¬ 
ferences in the galaxy clustering. 

We employ “merger trees” to follow the eyolution of 
haloes and subhaloes, their mass growth, tidal stripping, merg¬ 
ers, as well as transient effects in their p roperties. Our tree s 
were built using the algorithm described in ijiang et al.l (l2014l l. 


^ We checked that the mean difference between the bulk velocity of 
DM particles and star particles in the inner 30 kpc for the subhaloes 
with 8.77 < Mstar[MQ] < 10.77 is smaller than 10 km s“®. 
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Table 2. Number of central and satellite EAGLE galaxies for four 
stellar mass bins. In parentheses we provide the percentage of EA¬ 
GLE galaxies with a counterpart in DMO. 


logiQ(Mstar[M©]) 

EAGLE 

Central 

Satellites 

8.77-9.27 

3954 (92 %) 

3475 (68 %) 

9.27-9.77 

2550 (92 %) 

2068 (74 %) 

9.77- 10.27 

1551 (94 %) 

1247 (76 %) 

10.27- 10.77 

968 (92 %) 

652 (80 %) 


employing 201 snapshots for DMO and 29 snapshots for EA¬ 
GLE. In both simulations the output times were approxi¬ 
mately equally spaced in log(a) for a > 0.2, where a is the 
cosmic scale factor. 

Finally, we note that to avoid problems related to subhalo 
fragmentation and spurious structures, we remove from our 
analysis satellites without resolved progenitors. 


2.3 EAGLE and DMO crossmatch 

EAGLE and DMO share the same initial conditions, so we 
expect roughly the same non-linear objects to form in both 
simulations. This is a powerful feature: it enables us to identify 
the EAGLE galaxy that a given DMO subhalo is expected to 
host, and thus, to probe directly the assumptions of SHAM. 

In practice, we link DMO sub haloes to EAGLE gal axies 
follo wing the proce s s des cribed by ISchaller et al.l (l2015l l ; see 
also IVelliscig et al.l ll2014l ~) . For every subhalo in EAGLE we 
select the 50 most-bound DM particles. If we find a subhalo in 
DMO which shares at least half of them, the link is made. We 
confirm the link if, repeating the same process starting from 
each DMO subhalo, we identify the same pair. We only search 
the pairs with more than 174 DM particles in each simulation, 
which corresponds to a minimum halo mass of 2 x 10® Mq in 
DMO. This procedure yields a catalogue of 13687 galaxies with 

10«-^7 < Mstar[MQ] < 

In Table [2] we list the fraction of successfully matched 
centrals and satellites, for four stellar mass bins. Overall, the 
match is successful for more than 90 % of centrals in EAGLE, 
independently of their mass. The success rate drops to 68 — 
80 % for satellites, with low-mass satellites showing the lowest 
percentage. This is a consequence of the finite mass resolution 
of the simulations (see also Appendix A), the mass loss due to 
interactions with the host halo, small differences in the timing 
at which mergers happen, and the high-density environment 
in which they reside. 


3 SUBHALO ABUNDANCE MATCHING 


expected to be tightly correlated with the DM content of the 
host halo (contrary to e.g. the star formation rate, which could 
be more stochastic). The subhalo property should capture the 
time-integrated mass of gas available to fuel star formation, 
but there is no consensus as to what the most adequate sub¬ 
halo property ifl 

A commonly used property in SHAM is the maximum 
of the radial circular velocity profile (which can be regarded 
as a measure of the depth of the potential well of a subhalo) 
defined at a suitable time: 


Kirc(z) = max|^-\/ GM{z, < r)/rj. 


( 2 ) 


where M(< r) is the mass enclosed inside a radius r. 

There are several reasons to prefer circular velocity over 
halo mass in SHAM: i) it is typically reached at one tenth 
of the halo radius, so it is a better characterization of the 
scales that we expect to affect the galaxy most directly; ii) 
it is less sensitive to the mass stripping that a halo/subhalo 
experiences after it has been accreted by a larger object 
200^ Knayrtgov et ahl [20041 : iNagai fc Kravts^ 
I 2 OO 5 I : IPenarrubia et al.ll2008lj : hi) it does not depend on the 
definition of halo/subhalo mass. 

However, the I4irc(2) of DM objects are complicated func¬ 
tions, which can display non-monotonic behaviour in time, 
with transient peaks and dips, and that are subject to envi¬ 
ronmental and numerical effects. This is illustrated by Fig. [T] 
which shows examples of the evolution of the circular veloc¬ 
ity for two central (left panel) and two satellite (right panel) 
subhaloes in DMO. These subhaloes are selected to illustrate 
the evolution of the maximum circular velocity in typical cen¬ 
trals and satellites. We can see that there is no obvious time 
at which I4irc(z) should be computed for an accurate SHAM. 

We will implement four “flavours” of SHAM, each using 
Wire( 2 ) defined at a different time: Umax, Upeak, Unfaii, and 
Ueiax (each marked by horizontal lines and arrows of a differ¬ 
ent colour in Fig. [TJ. The first three flavours have been used 
previously in the literature, whereas the fourth is first used in 
this work. We discuss the four SHAM flavours next. 


1) Umax is the maximum circular velocity of a subhalo at the 
present time, Ucirc(2: = 0). 

2) Uinfaii is the maximum circular velocity at the last time a 
subhalo was identified as a central. 

3) Upeak is the maximum circular velocity that a subhalo has 
reached. 

4) Ureiax is the maximum circular velocity that a subhalo has 
reached during the periods in which it satisfied a relaxation 
criterion. The criter ion we use is Atform > tcross, following a 
similar approach to iLudlow et al.l (l2012l l . The motivation is 
that after a major merger, DM haloes typically need of the 
order of one crossing time (feross = 2r2oo/U2oo = 0.2/H(z)) to 
return to equilibrium. Thus, we define Atform as the look-back 
time from a given redshift Zi to the redshift where the main 


In this section we discuss different SHAM flavours and their 
implementation in DMO. 

3.1 SHAM flavours 

The main assumption of SHAM is that there is a one-to-one 
relation between a property of a DM subhalo and a property of 
the galaxy that it hosts. The galaxy property is usually taken 
to be the stellar mass (or K-band luminosity), since this is 


^ Pr o perties used in the l iterature include AIdm llVale &: OstrikeJ 
120041 : fshankar et al.l [200(11 . maximum circular velocity at present 
for centrals and at infall for satellites ilConrov et al.|| 2Q06[i. virial 
mass for centrals and mass at infall for satellites llWetzel fc White! 
I2OI0I : iBehroozi et alJl2010h . virial mass for centrals and the high¬ 
est m ass along the merger history for satellites dMoster et al.l 
l2010h. and highest circular velocity along the merger history 
llTruiillo-Gomez et al.ir201ll : iNuza et al.ll201^ (see I Reddick et al.l 
120131 . for a detailed comparison between the previous properties). 
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Figure 1. Evolution of the maximum circular velocity of two central (left panel) and two satellite (right panel) subhaloes in DMO. The black 
solid lines show the circular velocity, the grey coloured areas the periods during which the subhaloes are satellites, and the blue coloured 
regions the intervals during which the subhaloes satisfy our relaxation criterion. Horizontal lines highlight the circular velocity at 2 : = 0 
(Fmax, red dashed line), the circular velocity at the last infall for satellites and Vhiax for centrals (Vinfaii, orange dotted line), the maximum 
circular velocity that a subhalo has had (Fpeaki green dot-dashed line), and the maximum circular velocity that a subhalo has reached while 
it satisfied our relaxation criterion (Frelaxj blue long dashed line). 


progenitor of a subhalo reached 3/4 of the subhalo mass at Zi 
(we tested other definitions for the formation time, from 4/5 
to 1/2, finding roughly the same results). The periods during 
which this condition is satisfied are shown as blue shaded re¬ 
gions in Fig.[T] We can compute K-eiax for more than the 99% 
of the subhaloes in DMO and we remove the subhaloes where 
V/eiax cannot be calculated. We cannot compute Keiax for the 
full sample because this quantity is not defined for subhaloes 
younger than one crossing time. 

Although Wire should generally not be affected by the 
stripping of the outer layers of a halo, in the right panel of Fig. 
[T]we can see that it does still evolve for satellites. The decrease 
in Vciic{z) after infall is in large part due to tidal heating, a pro¬ 
cess which reduces the density in the inner regions of the sate l- 
lites llGnedirJl2003l : iHavashi et al.ll2003 : iKravtsov et ai1l2004h . 
The tidal heating is related to the position of a subhalo inside 
its host halo, being maximum at pericentric passages. We can 
see an extreme case of tidal interactions in the top right panel, 
where this subhalo has lost more than 99% of its mass since 
it became a satellite. After the last infall at 1 + z ^ 2.3 (grey 
shaded region), the value of Wire decreased by about 80% in a 
series of steps {z ~ 1, 0.5, 0.3, 0.1, 0.05, and 0), which indeed 
coincide with pericentric passages. This implies that satellite 
galaxies have lower values of Wnax than central galaxies of the 
same stellar mass. Thus, a Wnax-based SHAM will underesti¬ 
mate the fraction of satellites. 

Tidal heating and stripping affect not only satellites but 
also “backsplash satellites”, i.e. centrals at z = 0 which were 
satellites in the past, reducing their circular velocity while they 


were inside a larger halo. An example of this process is shown 
in the bottom left panel of Fig. [T] where the circular velocity 
of this subhalo was reduced by about 7% in the period during 
which it was a satellite (while the mass was reduced by 50%). 

Wnfaii is less affected by these problems. Unfortunately, 
this parameter also underestimates Wire for satellites because 
tidal heating starts t o act even before a sa t ellite is accreted by 
its future host halo llKravtsov et al.ll2004l : IWetzel et aLllioi^ . 
I 2 OI 4 I ). This can be seen in the top (bottom) right panel Fig. 
m where the value of Wire starts to decrease at 1 + z ^ 3.4 
(1 -|- z ~ 4.4) while the subhalo is accreted at 1 -|- z ~ 2.4 
(1 + z- 1.2). 

Additionally, there are new problems associated 
with Wnfaii- The f i rst concerns satell ite-satellite mergers 
(I Angulo et al.l l2009l : IWetzel et al.l l2009l ). which should in¬ 
crease the mass of stars in a satellite but this is not captured 
by Wnfaii- The second is related to the definition of Wnfaii; it 
is not clear whether we should consider Wnfaii as the circular 
velocity at the last infall or at previous accretion events. We 
can see in the bottom right panel of Fig. [T] a satellite which 
has undergone several alternating central/satellite periods, 
decreasing in total its circular velocity by 20% and its mass 
by 70%. 

An alternative solution is provided by Wpeak since it can 
capture all episodes during which the subhalo grows, and it is 
not affected by a reduction of Wdre due to environmental ef¬ 
fects. However, this definition similarly has its own problems. 
During periods of r apid mass accret ion, DM haloes are usually 
out of equilibrium (iNeto et al.ll2007l ). In particular, during ma- 
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Figure 2. Relation between Mgtar of EAGLE galaxies and SHAM 
flavours for the corresponding DMO subhaloes. The grey scale rep¬ 
resents the number of subhaloes per pixel, which ranges from 1 
(light grey) to 100 (black). Blue and red contours mark the regions 
containing 68% and 95% of the distribution, respectively. 


jor mergers the concentration can be artificially high (this is a 
maximum compression phase of halo for mation), which tem¬ 
porarily increases th e value of Vchc (e.g. iLudlow et al.llioid : 
iBehroozi et al.ir2014l ). This effect is responsible for the peaks 
seen in all four panels of Fig. [T] Although at any given time 
it is rare to find a halo in this phase, the value of ypeak will 
likely be assigned during one of these phases, and will thus 
overestimate the depth of the potential well. In addition, this 
effect makes the predictions of V^peak dependent on the number 
and intervals of the output times of a given simulation. 

Here we propose a new measure, Hreiax, designed to over¬ 
come the problems of Hmax, Vinfaii, and Fpeak- It is marked by 
arrows and horizontal lines of blue colour in Fig. [T] VJ-eiax is 
insensitive to tidal heating, transient peaks, and consistently 
defined for centrals, satellites, and backsplash satellites. We 
emphasise that it is desirable to eliminate the aforementioned 
problems because they represent changes in Vdic which are 
not expected to correlate with the growth history of Matar, 
and will thus add extra noise to SHAM. 

We now take a first look at the performance of each SHAM 
flavour. Fig. [2] shows the relation between each of the four 
properties described above for DMO subhaloes, as indicated 
by the legend, and Matar of their galaxy counterpart in EA¬ 
GLE (see i l2.3l) . All panels show a tight correlation, which 
supports the main assumption of SHAM, that the relation 
between stellar mass and SHAM parameters should be mono¬ 
tonic. However, the scatter in the relation is different in each 
panel because of the effects discussed in this section: Vmux 
shows the largest and V(eiax the smallest dispersion. In the 
next sections we will quantify the performance of each SHAM 
flavour in detail. 


Table 3. Parameters of the functions that fit the 
mean, fi, and standard deviation, a, of the model for 
P(logjQ Matar [Mq]| logjo Vi [km s“^]). The unit of V is km s“^. 


(7 = a-I-6 logtQ V tJ. = a + bta,n ^(c + dlogtQV) 
a b a b c d 



0.60 

-0.20 

7.03 

5.52 

-1.84 

1.12 

Knfall 

0.53 

-0.16 

7.01 

5.52 

-1.84 

1.12 

^^eak 

0.55 

-0.16 

7.70 

5.42 

-1.89 

1.05 

^elax 

0.59 

-0.20 

7.14 

5.55 

-1.86 

1.10 



40 50 70 100 200 300 

V, [km/s] 


Figure 3. Standard deviation (top panel) and mean (bottom 
panel) of the Gaussians used to fit PDFs for logjQ Matar [Mq]. For 
clarity, we have shifted the a (fi) of Vmax, Vinfaii, and Vpeak by 
+0.3, +0.2, and +0.1 (+3, +2, and +1), respectively. The best fit¬ 
ting functions are shown by coloured lines, and the values of the 
respective parameters are given in Table [3] 


3.2 Implementation 

The first step to implement the four flavours of SHAM is to 
compute P(logto Matarl logjQ V): the probability that a sub¬ 
halo hosts a galaxy of mass Matar given a certain value of the 
SHAM flavour V. We compute this quantity as follows: 

1) We select subhalo-galaxy pairs from the matched cata¬ 
logues (see with logiQ Matar [Mq] > 7 and divide them 
according to logjQ V in bins of 0.05 dex. We discard bins with 
fewer than 100 objects. 

2) For each logjQ V bin, we compute the distribution 
of logjQ Matar and fit it by a Gaussian function, G ^ 
exp(—0.5(logj^g Matar — /i)^/(cr)^), where jj, is the mean and 
a the dispersion. 

3) We fit a linear function, cr = a + 6 logj^g V, to (T(logj^Q V) 
and an arctangent, fj, = a + ti tan“^(c + dlogj^gV), to 
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/i(logjQ Vi). The values of the best-fit parameters are given 
in Table [3] and the quality of the fit can be judged from Fig. 

El 

4) Using these functions, we model P(logjQ Mstar | log^g U) 
as G[/r(logio Ui),cr(logio V)]. 

Our second step is to assign a value of Mstar to every sub¬ 
halo in DM0 (not only those with an EAGLE counterpart) 
by randomly sampling P(logjQ Mstar | logjQ Vi). This creates a 
catalogue that captures the appropriate stochastic relation be¬ 
tween Mstar and the parameter Vi. If the relation for EAGLE 
galaxies were also stochastic with respect to the underlying 
density field, then we would expect these catalogues to have 
the same clustering properties as EAGLE. 

We note we have verified that the resulting stellar mass 
function agrees closely with that of the EAGLE simulation. 
However, to ensure identical mass functions and thus to make 
subsequent comparisons more direct, we assign to each SHAM 
galaxy the value of Mstar of the EAGLE galaxy at the same 
rank order position. Hereafter, we will refer generically to the 
galaxy catalogues created in this way as “SHAM galaxies” and 
specifically to the galaxy catalogues generated by a particular 
SHAM parameter as “Vi galaxies”. 

We compute 100 realizations of SHAM for every flavour 
using different random seeds. The results presented in the fol¬ 
lowing sections are the mean of all the realizations and the 
errors the standard deviation. 


4 RESULTS 

In this section we test how well SHAM reproduces different 
properties of EAGLE galaxies. In particular, we will explore 
the predicted stellar mass of individual subhaloes (S31), the 
halo occupation distribution (' >14.2.111 . the number density pro¬ 
files inside haloes (' >14.2.211 . the clustering in real and redshift 
space ( >14.3.11 >14.3.21) . and the assembly bias ( >14.3.311 . 

We present results for 4 bins in stellar mass, as indicated 
in TableO This range was chosen to include only well sampled 
and well resolved galaxies (comprised of more than 230 star 
particles) and bins with enough galaxies to allow statistically 
significant analyses (more than 1600 galaxies). 


4.1 Correlation between Mstar and Vi 

In >j3]we discussed that in some cases Umax, Unfaii, and Upeak 
are unintentionally affected by physical and numerical effects, 
which degrades the performance of SHAM. We also argued 
that VVeiax does not present any obvious problem and thus 
we expected it to be the SHAM flavour that correlates most 
strongly with Mstar. This was qualitatively supported by Fig. 
(2] We start this section by quantifying these statements using 
the Spearman rank correlation coefficient between the Mstar of 
EAGLE galaxies and the SHAM flavours of DMO subhaloes. 

The Spearman coefficient measures the statistical depen¬ 
dence between two quantities and is defined as the Pearson 
correlation coefficient between the ranks of sorted variables. 
A value of unity implies a perfect correlation, which in our 
case means that the stellar mass of a galaxy is completely 
determined by its SHAM parameter, i.e. that the relation is 
monotonic and thus without scatter. A value close to zero 
means that the relation between the SHAM parameter and 
Mstar is essentially random. 


In Fig. 2] we show the Spearman coefficient for the corre¬ 
lation between Mstar and each of our four SHAM parameters. 
We divide our sample into three groups: i) present-day central 
subhaloes that have been centrals for their entire merger his¬ 
tory except for at most 4 snapshots (centrals, left panel), ii) 
present-day central subhaloes that have been satellites more 
than 4 snapshots in the past (backsplash satellites, central 
panel), and hi) present-day satellites (satellites, right panel). 

In general, we find that the correlation increases with 
Mstar, that it is stronger for centrals than for satellites, and 
that Ureiax displays the strongest correlation with Mstar- Re¬ 
garding the different SHAM flavours, we find that i) for cen¬ 
trals Upeak produces the weakest correlation, ii) for satellites 
Umax shows the weakest correlations, and iii) Unfaii and Ueiax 
consistently display the best performance, with Ueiax showing 
a slight improvement over Unfaii for satellites. 

Our results can be understood from the discussion in §2. 
For centrals. Unax and Unfaii are identical by construction and 
they are close to the value of Ueiax because Uirc tends to 
increase with decreasing redshift for centrals. On the other 
hand, Upeak is usually established while Uirc is temporarily 
enhanced as a result of merger events. For backsplash satel¬ 
lites, Umax and Unfall are also identical by construction, but, 
unlike Ueiax, they are insensitive to their more complicated 
history, which explains their weaker correlation with Mstar- 

Finally, satellites display the weakest correlations, with 
Umax presenting the lowest correlation coefficient. This is be¬ 
cause Uirc decreases soon after infall, whereas the stellar mass 
can still grow until the gas is completely exhausted (although 
tidal forces may strip stars). Unfaii alleviates this problem but 
the interaction between the satellites and their host haloes 
starts before the satellites reach the virial r adii of their host 
haloes dHavashi et al.l[2003l : lBahe et al.ll2013ll . Because of this, 
Ueiax better captures the expected evolution in Mstar- Lastly, 
Upeak is still affected by the out-of-equilibrium artefacts dis¬ 
cussed above. 

In sections §4.2 and §4.3 we will investigate how the dif¬ 
ferent correlations impact the predictions for the clustering of 
EAGLE galaxies. 

4.2 The properties of SHAM galaxies 

To predict the correct galaxy clustering, SHAM has to asso¬ 
ciate galaxies with the correct subhaloes, to allocate the right 
proportion of centrals and satellites, and to place galaxies fol¬ 
lowing the correct radial distribution. Therefore, before pre¬ 
senting our results regarding the clustering, we will explore 
these ingredients separately. 

4 . 2.1 Halo occupation distribution 

The panels of Fig. [S] show the distribution of host halo masses 
for centrals and satellites in different Mstar bins. The left 
(right) curves display the number of centrals (satellites) in 
haloes of a given mass multiplied by the linear bia^ expected 
for haloes of that mass and normalized by the total number 
of subhaloes. The quantity plotted can be interpreted as the 

® We calculate the linear bias as 6 = 1 -(- dMo fc Whitd[T996ll , 
where 5c ~ 1.69 is the critical linear overdensity at collapse and 
= 5clcr{M, z) is the dimensionless amplitude of fluctuations which 
produces haloes of mass M at redshift z. 
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Figure 4. The Spearman rank correlation coefBcient between the Mgtar of EAGLE galaxies and each of four parameters used to perform 
SHAM. The subhaloes are divided into three categories: centrals (left panel), backsplash satellites (central panel), and satellites (right panel), 
see the main text for more details. The fraction of objects in each category is given in the legend. The red (orange) points are displaced 
horizontally by -0.03 (-1-0.03) dex for clarity. 


relative contribution to the large-scale clustering from galax¬ 
ies hosted by haloes of different mass. In each panel, the 
histogram presents the results for EAGLE galaxies and the 
coloured lines the results of the SHAM implementations de¬ 
tailed in §3.2. For EAGLE galaxies we employ the M 200 of 
their host halo DM0 counterpart, which makes this plot less 
sensitive to baryonic effects that might systematically change 
the mass of DM haloes. For the 5.1% of EAGLE galaxies 
hosted by a halo without DMO counterpart, we multiply M 200 
by /dm = 1 — (Hb/Hm) = 0.843. This is the average difference 
in M 200 between the hydro dynamic and g r avity- only EAGLE 
simulations, as reported bv ISchaller et al. (l2015l ). 

Firstly, we see that using Vhiax as SHAM parameter re¬ 
sults in shifted M 200 distributions and an underprediction, of 
about 30 %, of the number of satellites for all Mstm bins. This 
is a consequence of the reduction of Fmax for satellites after be¬ 
ing accreted, which introduces centrals hosted by lower-mass 
haloes into the SHAM sample. 

The distribution of EAGLE galaxies is closely reproduced 
by the other SHAM implementations, for all stellar mass bins. 
The distributions for central galaxies have almost identical 
shapes and peak at roughly the same host halo mass. Note, 
however, that compared to Einfaii and VJ-eiax, Epeak yields sys¬ 
tematically broader distributions for centrals. This is consis¬ 
tent with the differences in the correlation coefficient shown 
in the left panel of Fig. 3] 

Additionally, the Vinfaii, Epeak, and Keiax satellite frac¬ 
tions agree to within ~ 5% with those in EAGLE, although 
they are systematically lower, as shown in Table ID How¬ 
ever, for the two lowest stellar mass bins, there is a slight 
overestimate of the number of satellites in haloes of mass 
A /200 > 10^® M 0 , and a somewhat larger underestimate for 
haloes of mass A /200 < lO^^M©, as Table [5] shows. Since the 


Table 4. Satellite fraction for EAGLE and SHAM galaxies using 

Lmax, Vjnfalb L^eak) and V(-elax* 


logio(Mstar[MQ]) 

Vmax 

Vinfall 

Vpeak 

Vrelax 

EAGLE 


Satellite fraction 


8.77-9.27 

0.32 

0.43 

0.46 

0.45 

0.47 

9.27-9.77 

0.30 

0.42 

0.44 

0.43 

0.45 

9.77- 10.27 

0.28 

0.40 

0.41 

0.41 

0.44 

10.27- 10.77 

0.25 

0.37 

0.38 

0.37 

0.40 


difference is greater for the high-mass haloes, the overall satel¬ 
lite fraction is underestimated. We will analyse the repercus¬ 
sion of these small differences in forthcoming sections. 

4-2.2 Radial distribution of satellites 

Fig. [6] shows the spherically averaged number density profiles 
of satellite galaxies with 8.77 < logA/star [Mq] < 10.77, nor¬ 
malized to the mean number density within r 2 oo- We show re¬ 
sults for galaxies inside haloes in three DMO halo mass bins, 
as indicated by the legend. The data points represent the pro¬ 
files measured using EAGLE galaxies, whereas coloured lines 
display the stacked results for SHAM galaxies. For compari¬ 
son, we also plot the best-fit NFW profile to the EAGLE data, 
which appears to be a good description over the range of scales 
probed. 

Given the statistical uncertainties, the number density 
profiles of EAGLE and SHAM galaxies agree reasonably well 
with the exception of Vmax- For Vmax, the differences are 
greater, it predicts shallower profiles and a lack of objects in 
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Figure 5. The distribution of host halo masses, M 200 , for SHAM and EAGLE galaxies in different Mstar bins. Histograms show the results 
for EAGLE galaxies and coloured lines for different SHAM flavours, as detailed in §3.2. The left (right) curves display the number Ni of 
centrals (satellites) in haloes of a given mass multiplied by the linear bias b and normalized by the total number of subhaloes Ntot- Therefore, 
the y-axis reflects the relative contribution of galaxies in different host halo mass bins to the large-scale correlation function. Note that for 
EAGLE galaxies we employ the M 200 of tho DMO counterpart, which makes our comparison less dependent on the baryonic processes which 
might alter the mass of the host halo. 


the inner parts compared to EAGLE. This is consistent with 
the effects described previously: the inner parts of haloes ex¬ 
perience large tides and are also populated by the oldest sub¬ 
haloes. In contrast, on scales r > 0.1 Mpc, the Vpeak, Vlnfaii 
and Ereiax profiles are consistent with the measurements from 
EAGLE for all three halo mass bins. 


4.3 Galaxy clustering 

We are now in the position to investigate the performance of 
SHAM in predicting the clustering of galaxies. We first dis¬ 
cuss the two-point correlation function (2PCF) in real-space 
(> 14.3.ID . then the monopole of the redshift-space correlation 
function ( >14.3.2D . and we end with an exploration of assembly 
bias in both EAGLE and SHAM (gSS]). 

We compute the 2PGF, ^(r), by Fourier transforming the 
galaxy number density field, which is a faster alternative to 
a direct pair count. We provide details of the procedure in 
Appendix B. We estimate the statistical uncertainties in the 


2PG F of EAGLE galaxie s using a spatial jack-knife resampling 
(e.g.. IZehavi et 311120051 ). Summarizing, we divide the simula¬ 
tion box in 64 smaller boxes and then we compute 64 2PGFs 
removing one of the small boxes each time. The statistical 
errors are the standard deviation of the 64 2PGFs. On the 
other hand, we assign errors to the 2PCF of SHAM galaxies 
by computing the standard deviation of 100 realizations for 
each SHAM flavour. 


4-3.1 Real-Space Correlation Function 

In Fig. [3 we compare the 2PCF for EAGLE galaxies (black 
solid line) with results of stacking 100 realizations of SHAM 
for different stellar mass bins. In the bottom panel of each 
subplot, we display the relative difference of the 2PCFs of 
each Vi galaxy sample and EAGLE (A^i = ^i/^EAGLB — !)• 
Fig. [7] shows that Hmax clearly underestimates the clus¬ 
tering on small scales, which is consistent with the under¬ 
estimation of the satellite fraction discussed earlier. A lower 
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Table 5. Number of satellites as a function of Mstar and M 200 for 
EAGLE and SHAM galaxies using V^relax- 


(-^star [M 0 ]) 

logio(M2oo[M0]) 

EAGLE 

^elax 





N. of satellites 

8.77 

- 9.27 

11.6 - 

12.6 

1060 

780 



12.6 - 

13.6 

1274 

1328 



13.6 - 

14.6 

945 

1057 

9.27 

- 9.77 

11.6 - 

12.6 

584 

444 



12.6 - 

13.6 

834 

838 



13.6 - 

14.6 

633 

695 

9.77 - 

- 10.27 

11.6 - 

12.6 

293 

208 



12.6 - 

13.6 

495 

482 



13.6 - 

14.6 

459 

452 

10.27 

- 10.77 

11.6 - 

12.6 

65 

61 



12.6 - 

13.6 

280 

253 



13.6 - 

14.6 

307 

292 



r [Mpc] 


Figure 6. The radial distribution of galaxies with 8.77 < 
^ogio('^star [M 0 ]) < 10.77, inside haloes of mass , 

— 1 O^^ *^M 0 (displaced by +1 dex), and more massive than 
(displaced by +2 dex). We present the spherically av¬ 
eraged number density, normalized to the mean number density 
within the host halo. Black symbols show the results for EA¬ 
GLE galaxies, whereas coloured lines show stacked results from 
100 realizations of SHAM using Vmax, Mnfall, V'peak, and V)-elax- 
The error bars indicate the 1 cr scatter for EAGLE galaxies. 
The shaded region marks the standard deviation of 100 realiza¬ 
tions of SHAM using Hrelax- We overplot the NEW profiles (with 
Ta = 0.81, 0.29, 0.21 Mpc from the most to the least massive halo 
sample) that best fit the EAGLE data points shown. 


satellite fraction also implies a lower mean host halo mass and 
a smaller bias, which explains the underestimation of the cor¬ 
relation function on larger scales. 

On the other hand, Vinfaii, V^peak, and l^eiax galaxies agree 
very closely with the EAGLE measurements. On scales greater 
than 2 Mpc, all three flavours are statistically compatible with 
the full hydrodynamical results. We note that the small differ¬ 
ences are of the same order as the variance introduced by dif¬ 
ferent samplings of P(logiQ Mstar | logjo ^)- For the two higher 
stellar mass bins, the statistical agreement is extended down 
to 400 kpc. 

For the two lower stellar mass bins, we measure statis¬ 
tically significant differences on small scales, especially for 
Vpeak and Keiax galaxies. The SHAM clustering appears to be 
20 — 30 % high, which could originate from either more con¬ 
centrated SHAM galaxy distributions inside haloes, or from 
an excess of satellite galaxies. At first sight, the latter ex¬ 
planation appears to contradict our previous finding that the 
satellite fraction is underpredicted by SHAM. However, the 
small-scale clustering will be dominated by satellites inside 
very massive haloeqj, whose number is indeed overpredicted 
(c.f. Table O. 

Additionally, Fig. [5] showed that Vinfaii resulted in the 
same underestimation of the overall satellite fraction as Vpeak 
and Keiax but a somewhat smaller satellite fraction in the high 
halo mass range. This explains the weaker small-scale cluster¬ 
ing seen in Fig. [7] and consequently the slightly better agree¬ 
ment with EAGLE. Note, however, that the smaller number of 
satellites could be caused by the fact that Vcirc decreases even 
before accretion, especially near very massive haloes. This sug¬ 
gests that the apparent improved performance of Vinfaii could 
be simply a coincidence. We will investigate these hypotheses 
in §5. 

4-3.2 Redshift-space Correlation Function 

Fig.[8]is analogous to Fig.[7|but for the redshift-space 2PGFs. 
We compute 2PCF in redshift-space because they are more 
directly comparable with observations than the 2PCF in 
real-space. We transform real- to redshift-space coordinates 
(r and s, respectively) in the plane-parallel approximation: 
s = r -b (1 -b z){'v ■ C)/H{z), where v the peculiar velocity, 
H{z) is the Hubble parameter at redshift z, and k is the unit 
vector along the z direction. On scales greater than 6 Mpc, 
this transfor mation enhan ces the clustering signal due to the 
Kaiser effect (lKaiserlll987l ). On smaller scales, motions inside 
virialised structures produce the so-called finger-of-god effect, 
smoothing the correlation function. 

The differences between the SHAM flavours are qualita¬ 
tively similar in real and redshift space: Vmax underpredicts 
the clustering on all scales and for all Mstar bins, the remain¬ 
ing SHAM flavours are statistically compatible with EAGLE 
on scales > 1 Mpc, and the clustering amplitude of Vinfaii is 
systematically below that of Veiax and Vpeak- On the other 
hand, compared with the real-space 2PCFs, there is better 
agreement between Veiax, Vpeak and EAGLE on small scales 
for the two lowest mass bins. This improvement is likely a re¬ 
sult of two effects. First, a considerable fraction of close pairs 

^ For instance, in the case of the small-scale clustering of galaxies 
in the lowest stellar mass bin, the contribution of satellites inside 
haloes with M 200 > 10^® Mq is almost an order of magnitude larger 
than that of satellites in haloes with M 200 < 10^® Mq. 
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Real space clustering 
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Figure 7. Real-space two-point correlation function for galaxies in different stellar mass bins. The black solid line shows the clustering in 
EAGLE, with the grey shaded region the jackknife statistical error. The coloured lines show the clustering predictions of SHAM using Vjnax 
(red dashed), Finfall (orange dotted), Fpeak (green dot-dashed), and Vj-elax (blue long dashed). The error bars indicate the standard deviation 
of 100 realizations of SHAM for each flavour. In the lower half of each panel we display the relative difference of SHAM with respect to 
EAGLE (A^i = Ci/^BAGLE ~ !)• Note that the green and orange lines are slightly displaced horizontally for clarity. Using Vj-eiax SHAM 
parameter, we retrieve the clustering of EAGLE galaxies to within 10% on scales greater than 2 Mpc. 


in redshift space will be much further apart in real space, 
and hence better modelled by SHAM. Second, the incorrect 
HOD that SHAM galaxies show can be compensated by a 
stronger smoothing of the 2PCF: a greater number of satel¬ 
lites in high-mass haloes would increase the small-scale clus¬ 
tering, but these satellites would also have a higher velocity 
dispersion. 

If the agreement between SHAM and EAGLE galaxies 
were reached because of the cancellation of different sources 
of error, then this would impact other orthogonal statistics, 
for instance, the strength of the so-called assembly bias (other 
examples are the high-order multipoles of the redshift space 
2PCF). We explore this next. 


4-3.3 Assembly bias 

Assembly bias generically refers to the dependence of halo 
clustering on any halo property other than mass, such as for- 
mation time, concentration, or spin (see, e.g.. lGao et al ] l2005l: 


iGao fc Whitel[^^7^ . It has been robustly detected in DM sim¬ 
ulations, but it is not clear what is the effect of assembly bias 
on galaxy clustering. This is because a given galaxy sample will 
typically be a mix of haloes of different masses and properties. 
Although the strength of the effect depends on the assump¬ 
tions of the underlying galaxy formation model, semi-analytic 
galaxy formation models an d SHAM both suggest that assem¬ 
bly b i as is indeed import ant (iGroton et al.|[20OT : IZentner et al.l 
I 2 OI 4 I : lilearin et ^l2014l ). To our knowledge, this issue has not 
yet been investigated with hydrodynamical simulations. 

In this section we explore whether assembly bias is present 
in EAGLE and whether the different SHAM flavours are able 
to predict its amplitude. To quantify the effect, we will com¬ 
pare SHAM and EAGLE 2PCFs to those measured in shuffled 
galaxy catalogues, which are built following the approach of 
iGroton et al.l ( 2007h : 

1) We compute the distance between each satellite galaxy 
and the centre-of-potential (GOP) of its host halo. This dis¬ 
tance is by definition zero for central galaxies. 
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Figure 8 . Same as Fig.[7]but for correlation functions computed in redshift space. The agreement between the clustering of EAGLE galaxies 
and V^eak and K-elax galaxies is even better in redshift space than in real space for the two lowest stellar mass bins. The main reason of 
the improvement on small scales is that most of the galaxies separated by those scales in redshift space are at larger distances in real space, 
where Fpeak nnd Fj-eiax galaxies accurately reproduce the clustering of EAGLE galaxies. 


2) We bin haloes according to M 200 using a bin size of 0.04 
dex. We verified that our results are independent of small 
changes in the bin widths. 

3) We randomly shuffle the entire galaxy population between 
haloes in the same mass bin. 

4) Finally, we assign a new position to each galaxy by moving 
the galaxy away from the COP of its new halo by the same 
distance that we calculated in 1). 


Fig-Elshows the mean relative difference between 100 real¬ 
izations of the shuffled catalogues and the original for different 
bins of stellar mass. The black solid lines display the results 
for EAGLE galaxies and the coloured lines for SHAM galax¬ 
ies. Since the position of galaxies/subhaloes is independent 
of the environment in the shuffled catalogues, their clustering 
should depend exclusively on the host halo mass. Therefore, 
any deviations from zero in Fig. [9] can be attributed to the 
assembly bias. Note that on small scales the ratio goes to zero 


by dehnition since the shuffling procedure does not alter the 
clustering of galaxies inside the same hal(0. 

We can clearly see that all shuffled catalogues underesti¬ 
mate the clustering amplitude for r > 1 Mpc. In the case of 
EAGLE galaxies, the differences are 20% on scales greater 
than 2 Mpc, roughly independent of stellar mass. This implies 
that assembly bias increases the clustering amplitude expected 
from simple HOD analyses by about 1/0.8 = 25 %. 

For SHAM galaxies, the effect goes in the same direc¬ 
tion but is somewhat weaker for all stellar masses (although 
it is more statistically significant for the lowest mass bins). 
This can be interpreted as SHAM lacking some environmen¬ 
tal dependence of the relation between Matar and Vi. Likely 
candidates are tidal stripping of stars, and/or tidal stripping, 
harassment, and starvation happening before a galaxy is ac¬ 
creted into a larger DM halo. These effects are important be- 

® Note that our findings would remain nearly the s ame if instead 
we shu ffled centrals and satellites separately following IZentner et al.l 
||2014) . This is because centrals and satellites with the same Mgtar 
rarely reside in the same halo (see Fig. EJ. 
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Figure 9. The relative difference of the 2PCFs of galaxies to that of a catalogue where galaxies are shuffled among haloes of the same mass 
(Agi = — 1 , see 94.3.31 for more details). We adopt the same labelling as in Fig. [7] The grey shaded areas show the standard 

deviation after applying the shuffling procedure 100 times for EAGLE galaxies. 


cause the efficiency with which a given halo creates stars will 
depend on the large-scale environment. We will return to these 
issues in the next section. 

Before closing this section, it is interesting to note the 
particular case of Knfaii, which was the SHAM flavour that 
agreed best with the real space 2PCF of EAGLE data. The 
fact that the strength of the assembly bias is roughly a factor 
of two smaller than in EAGLE supports the idea that the pre¬ 
vious agreement was partly coincidental. Since Vinfaii will be 
reduced near large haloes due to interactions experimented by 
subhaloes before being accreted, the number of satellites will 
decrease and the 2PCF will decrease on small scales. However, 
this will likely occur for the wrong haloes, which will result in 
a misestimated amplitude for the assembly bias. 


5 TESTING THE ASSUMPTIONS 
UNDERLYING SHAM 

In the previous section we showed that SHAM reproduces the 
clustering of EAGLE galaxies to within 10 % on scales greater 
than 2 Mpc and the corresponding assembly bias reasonably 


well. However, small differences remain, most notably the clus¬ 
tering on small scales and the strength of assembly bias. In 
this section, we will directly test four key assumptions behind 
SHAM with the aim of identify the likely cause of the dis¬ 
agreement. Unless stated otherwise, we will employ Keiax- 

5.1 Assumption I: The relation between Mstar and 
Vi is independent of redshift 

One of the main assumptions in our implementation of SHAM 
is that Mstar depends on the value of Ueiax, but not on the red- 
shift at which Ueiax was acquired. If this were not the case, 
we would expect an additional dependence on, for instance, 
the formation time of DM haloes. Such a redshift dependence 
would be particularly important for satellites, since on aver¬ 
age they reach their value of Ueiax at higher redshifts than 
centrals. 

To test this assumption, we cross-matched the DM0 and 
EAGLE catalogues at redshifts 2 : = [0,0.1,0.27, and 0.5]. 
We do this by assuming that the link between a pair of 
EAGLE-DMO structures matched a.t z = 0 carries over to 
their main progenitors at all higher a. Then, we construct 
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Figure 10. Standard deviation (top panel) and mean (bottom 
panel) of the Gaussian functions used to fit the dependence of the 
stellar mass PDF on V^eiax different redshifts. The symbols rep¬ 
resent the measurements of the widths and the centres and the lines 
show the fits. Neither the scatter nor the mean of Mstar and Frelax 
evolves significantly. The orange lines show the results for galaxies 
at 2 = 0 that have reached K-eiax at 2 = 0— 1.5 (solid), 2 = 1.5 —3.5 
(dotted), and 2 = 3.5 — 6 (dashed). 


P{logiQ Mstar I logio each redshift, which we fit by Gaus¬ 

sian functions with mean ji and standard deviation a. In Fig. 
nni we show the results. We can see that neither the mean 
nor the scatter in the relation show any strong signs of red- 
shift dependence. Nevertheless, to estimate the impact on the 
clustering, we generated a new set of Veiax galaxies at z = 0 
employing the scatter and mean derived at different redshifts. 
We find that the differences in the 2PCF are always below 
1 %. 

As a further test, we split the z = 0 catalogue into 3 bins 
according to the redshift at which Veiax was reached: [0— 1.5], 
[1.5 — 3.5], and [3.5 — 6[. We overplot the mean and variance 
of these subsamples in Fig. [10] as orange lines, from which we 
see no obvious dependence on redshift. 

Therefore, we conclude that subhaloes of a given Keiax 
statistically host galaxies of the same Mstar at z = 0, indepen¬ 
dently of the time at which their Wire reached Weiax. 


5.2 Assumption II: Baryonic physics does not affect 
the SHAM property of subhaloes 

It is well know n that baryons modify the properties of 
their DM hosts (jNavarro^^^b 199^ Gnedin fc Zhad l2002l : 
[Read fc Gilmor j l2005l : Oman et al.l 2015ll . Notable examples 
are an increase in the central density of DM haloes due to adia¬ 


batic contraction, or the possible reduction due to feedback or 
episodic star formation events. However, SHAM assumes that 
the relevant property is that of the DM host in the absence of 
those baryonic effects. 

We estimate the impact of this assumption by comparing 
the 2PCFs of central galaxies in our cross-matched catalogue, 
which we then rank order and select using either Wnax from 
EAGLE or Knax from their DMO counterpart. We focus on 
central galaxies since Vhiax behaves well for those objects and 
should be directly relevant for Vjnfaii satellites. In addition, the 
cross-matched catalogue is highly complete, with less than 8% 
of central galaxies being excluded (see Table]!]), thus we expect 
our results to be representative of the full population. 

In general, we find that the values of Vmax for EAGLE 
galaxies are ~ 5 % lower than for DMO galaxies, with a scatter 
of 0.08 dex. However, since the scatter is 27 % of that of 
Mstar at a fixed Wnax, we expect this difference to have only 
a minor effect on the clustering. This is indeed what we find. 
The orange dotted line in Fig. [11] shows the relative difference 
of the 2PCFs. The curve is compatible with zero. Note that the 
noise on scales below 0.5 Mpe is caused by the small number 
of objects at those separations owing to the absence of satellite 
galaxies in this analysis. 

Therefore, we conclude that baryonic effects introduce 
only small perturbations in W rank ordered catalogues and 
will thus only have a minor effect on SHAM predictions. In 
any case, the noisiness of the curves do not enable us to com¬ 
pletely rule out small changes in the galaxy clustering due to 
the presence of baryons. 


5.3 Assumption III: Baryonic physics does not 
affect the position of subhaloes 

Another potential consequence of the presence of baryons is 
the modification of the positions of the subhaloes, caused by 
the slightly different dynamics induced by th e different struc¬ 
ture of the host halo. Ivan Daalen et al ] (l2014h found this effect 
to be important on scales below 1 Mpe (but negligible on larger 
scales). 

We quantify this effect by comparing the 2PCF of EAGLE 
galaxies in two cases; i) using their actual positions, and ii) 
using the position of their DMO counterparts. We show the 
relative difference between these two cases as a black solid line 
in Fig. [11] There are no deviations from zero on large scales 
and the clustering is underestimated by around 5 % on small 
scales. Therefore, the assumption that the presence of baryons 
does not modify the orbits of the subhaloes is justified for the 
range of scales explored here. 


5.4 Assumption IV: For a given Keiax) Mstar does 
not depend on environment 

We now address the assumption that deviations from the 
mean Mstar at fixed Vj-eiax are independent of the environment. 
Specifically, in this subsection we will investigate whether 
Mstar at fixed Veiax is indeed uncorrelated with the host halo 
mass. This is a key assumption in SHAM, because it enables 
the modelling of galaxy clustering with a single subhalo prop¬ 
erty. Naturally, the properties of galaxies are complex func¬ 
tions of their merger and assembly histories, but as long as 
these details are not correlated with large scales, they can be 
treated as stochastic fluctuations within SHAM. 
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Figure 11. The impact on the 2PCF of different assumptions made by SHAM. Different lines compare the 2PCF of EAGLE with those 
of catalogues that aim to isolate different physical effects not included in SHAM in order to quantify their importance for modelling galaxy 
clustering. Black solid lines show the impact of baryonic effects on subhalo positions. Orange dotted lines show the impact of baryonic effects 
on V^circ- Red dashed lines assess the importance of star formation in satellites after accretion. Blue dot-dashed lines show the impact of the 
stripping of stars inside massive haloes. The error bars display the jackknife statistical errors. See the main text for more details. 


We start by displaying in Fig. [T^] the median growth his¬ 
tories of central and satellite EAGLE galaxies within a narrow 
14eiax bin from 97 to 103 km s“^. We show the evolution of 
Wire, A7 dm, Afgas, and Mstar for centrals (left panel) and satel¬ 
lites (right panel). Different line styles indicate the results for 
galaxies inside three disjoint host halo mass bins (note that 
the range of halo masses is different for centrals and satellites). 
In the case of satellites, the grey bands mark the time after 
these objects were accreted and brown bands mark the period 
after the maximum value of Mstar( 2 ) has been reached. 

Interestingly, for every parameter there is a clear distinc¬ 
tion between subhalos hosted by haloes of different masses. 
Central subhalos in the higher host halo mass bin formed 
more recently, host more massive galaxies, and have larger 
gas reservoirs than central subhalos hosted by less massive 
host haloes. Centrals hosted by haloes in the most massive 
bin host a galaxie with a median Mstar 33 % higher than the 
median value for all the subhalos. On the other hand, centrals 
hosted by the least massive haloes have a median Mstar 18 % 
smaller. Therefore, the difference in Mstar is 0.22 dex and it 


corresponds to 16 % of the scatter in Mstar at a fixed Weiax 
(c.f. Fig. [21), which suggests that a non-negligible fraction of 
the scatter can be explained by host halo variations. 

The evolution of satellites is also different in distinct host 
halo mass bins. Subhalos that reside in more massive haloes 
reduce their Mdm and Wire values more significantly, suffer 
from stronger stripping of gas, and stop forming stars ear¬ 
lier than galaxies in less massive haloes. Furthermore, these 
processes appear to start prior to infall in all cases (this also 
serves as an example of the limitation o f Wnfaii), but the ear¬ 
lier the higher the halo mass (see also iBehroozi et al.l l2014l : 
iBahe fc McCarth^ l2015l L Nevertheless, and contrary to the 
central galaxies, the final Mstar is nearly independent of the 
host halo mass. It is also important to mention that the me¬ 
dian Mstar for satellites is 21 % higher than for centrals, which 
corresponds to 0.08 dex. Thus satellite galaxies have statisti¬ 
cally a greater Mstar than central galaxies. 

In general, the evolution of satellites is more compli¬ 
cated than that of centrals due to processes like strangula¬ 
tion, harassment, ram-pressure stripping, and tidal stripping 


© 0000 RAS, MNRAS 000, 000-000 























16 


J. Chaves-Montero et al. 


Centrals 


Satellites 




10 '® 


io'° 


O 


w 

(n 

CO 


IS 


10 ® 


10 ® 


1+Z 


1+Z 


Figure 12. Evolution of the median of several subhalo properties along the merger history for centrals (left panel) and satellites (right 
panel) with K-elax between 97 and 103 kms~^. The coloured lines show the evolution of the Vcirc? -^gas, and Mstar, as indicated 

by the legend. For each component, different line styles indicate different ranges of host halo mass. Black lines are surrounded with a grey 
coloured area after tinfall and brown lines with a brown one after £j\^max. The centrals acquire at 2 = 0 and the ones that reside in 

more massive haloes end up with higher stellar masses. For satellites the behaviour of Mstar is more complex. After infall, the satellites which 
contain gas continue forming stars until their gas is lost, but they can lose stellar mass due to tidal stripping. The subhaloes in the right 
panel which reside in haloes of 10^^ ® — 10^^'®, — 10^^'®, 10^^ ® — 10^^ ® Mq end up with respectively 99, 94, 91 % of their The 

stripping of DM, gas, and stars is thus more efficient for satellites in more massive host haloes (see Table 0. 


('e.g. IWetzel fc Whitell2oiol : IWatson et al.]l2012l 'l. These effects 
alter the growth of satellites in a non trivial way, which is 
not accounted for in SHAM. On the other hand, these pro¬ 
cesses are still not fully understood in detail, and it is not clear 
how realistically current hydrodynamical simulations like EA¬ 
GLE capture them. For instance, a precise modelling of ram 
pressure necessarily requires a precise modelling of the intra¬ 
cluster and interstellar medium. Additionally, a precise mod¬ 
elling of tidal stripping requires precise morphologies of the 
infalling galaxies. Hence, we choose to bracket their impact 
on SHAM clustering predictions by considering two extreme 
situations. 

We first consider a situation where satellite galaxies do 
not form or lose any stars after infall, i.e. the value of Mstar 
is fixed at infall. The last column in Table [ 6 ] compares Mstar 
at infall with Mstar at z = 0 for galaxies hosted by haloes of 
different masses. The corresponding relative difference in the 
2PCF is displayed by a red line in Fig. 1111 In this case the 
satellites are less massive, which causes SHAM to result in a 
10 — 20 % (depending on the range of Mstar considered) lower 
clustering signal on large scales. On small scales, the deficiency 
is larger, reaching more than 50 %. 

The second situation we consider is one where there is 
no tidal stripping of stars in satellite galaxies, i.e. performing 
SHAM using the maximum value of Mstar a galaxy has ever 
attained along its history, In Table [ 6 ] we compare the 

values of with Mstar at z = 0 for different bins in stellar 

and host halo mass. On average, we find that the Mstar reduc¬ 
tion begins after satellites have lost about 2/3 of their Mdm- 


We also find that this effect is stronger for low mass galax¬ 
ies in higher-mass haloes, which is indeed expected due to the 
stronger tides. The reduction can be up to 10 % in haloes with 
M 200 > 1 O'® ®M 0 . On the other hand, this effect is essentially 
zero in haloes with M 200 < 10 '^ ®Mq. 

To quantify how the stripping of stars affects the SHAM 
clustering predictions, we calculate the 2PCF after selecting 
galaxies according to Mg™“* and compare it to our fiducial 
EAGLE catalogue. The result is shown by the blue dot-dashed 
lines in Fig. 1111 In this case, the clustering is enhanced by 
about 10 % on scales greater than 1 Mpc and by up to 35 % on 
scales below 1 Mpc. This can be understood from the fact that 
the satellites are more massive, causing the satellite fraction 
and mean host halo mass increase, which affects the 2PCF 
particularly on small scales. 

The two effects considered here, stellar stripping and re¬ 
duced gas supply in satellites, affect the SHAM galaxy cluster¬ 
ing to a similar magnitude but with opposite sign. In particu¬ 
lar, for all Mstar their impact is larger than the differences be¬ 
tween SHAM and EAGLE predictions. Thus, the final galaxy 
clustering is sensitive to how these processes balance each 
other, which in turn depends sensitively on baryonic processes 
not yet fully understood quantitatively. On the one hand, this 
implies an intrinsic limitation of current SHAM modelling that 
is reached when better than 20 % accuracy is required. On 
the other hand, this suggests that galaxy clustering on small 
scales is a powerful test for the physics implemented in hydro- 
dynamical simulations. For instance, if SHAM results were to 
be taken as the reality and confirmed by observations, then 
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Table 6. Effect of the stripping of DM and stars from satellites, 
and of star formation after infall. Each value corresponds to the 
median of the distribution and its uncertainty computed as cr = 
1.4826 yiKD/y/n, where MAD is the median absolute deviation 
and n the number of elements. 




Mdm 




M2OOIM0J 

MTinax 

MsTa“ 

ii^infall 

•^''•^star 



-^star - 

= io».^^ 

- 1O9.27M0 




1011.6 

-1012.6 

0.428 

± 

0.011 

1.000 ± 0.000 

1.714 

± 

0.030 

1012.6 

- fOis.e 

0.314 

± 

0.008 

0.954 ± 0.002 

1.828 

± 

0.035 

1013.6 

- 1014.6 

0.274 

± 

0.008 

0.904 ± 0.004 

1.446 

± 

0.024 



-^star - 

= 109.27 

- 1O9.77M0 




1011.6 

- 1012.6 

0.458 

± 

0.015 

1.000 ± 0.000 

1.526 

± 

0.028 

1012.6 

- 1013.6 

0.329 

± 

0.011 

0.987 ± 0.001 

1.752 

± 

0.037 

1013.6 

- 1014.6 

0.278 

± 

0.011 

0.935 ± 0.004 

1.550 

± 

0.034 



Mstax — 

= 109.77 . 

- 1O1°.27M0 




1011.6 

-1012.6 

0.489 

± 

0.023 

1.000 ± 0.000 

1.360 

± 

0.027 

1012.6 

-1013.6 

0.352 

± 

0.014 

0.998 ± 0.000 

1.532 

± 

0.033 

1013.6 

- 1014.6 

0.263 

± 

0.012 

0.945 ± 0.004 

1.433 

± 

0.030 



-^star 

= 

1019.27 

- 1O1O.77M0 




1011.6 

- IOI2.6 

0.670 

± 

0.049 

1.000 ± 0.000 

1.187 

± 

0.032 

1012.6 

- IOI3.6 

0.386 

± 

0.020 

0.993 ± 0.001 

1.197 

± 

0.018 

1013.6 

- 1014.6 

0.238 

± 

0.014 

0.937 ± 0.005 

1.246 

± 

0.025 


EAGLE would implement too weak ram-pressure stripping of 
massive satellite galaxies and excessive stellar stripping of low- 
mass galaxies in haloes with M 200 > 


6 CONCLUSIONS 

We have used the Ref-L100N1504 EAGLE cosmological hydro- 
dynamical simulation to perform a detailed analysis of subhalo 
abundance matching for galaxies with stellar mass ranging 
from to M©. We used a catalogue of paired 

EAGLE galaxies and subhaloes in a corresponding DM-only 
simulation to search for an optimal implementation of SHAM, 
to test its performance in terms of halo occupation numbers, 
radial number density proHles, galaxy clustering, and assem¬ 
bly bias, and to investigate the validity of some of the key 
assumptions underlying SHAM. 

Our main Endings can be summarized as follows: 

• We argue that all current SHAM implementations use 
DM properties that are afiected by undesired physical or nu¬ 
merical artefacts. Thus, we propose a new measure: Keiax, 
which is defined as the maximum circular velocity that a sub¬ 
halo has reached while satisfying a relaxation criterion. We 
also studied SHAM using three other subhalo properties: Umax, 
the maximum circular velocity at z = 0; Vinfaii, the maxi¬ 
mum circular velocity at the last time a subhalo was a central; 
and Vpeak, the maximum circular velocity that a subhalo has 
reached. In Fig. 0] we show that out of the four SHAM flavours 
we tested, Keiax exhibits the strongest correlation with Mstar, 
independently of the subhalo history. 

• Uinfaii, Upeak, aud Ueiax reproduce the EAGLE predictions 
reasonably well (with Keiax performing slightly better than 

Unfall and Vpeak): 


- Fig. [S] shows that the distributions of host halo masses 
between EAGLE and SHAM flavours match closely. In par¬ 
ticular, the total satellite galaxy fraction agrees to within 
5 %. 

- Fig. El shows that galaxy clustering strength agrees to 
within 10 % on scales greater than 1 Mpc and within 30 % 
on smaller scales. We highlight that this relation holds over 
four orders of magnitude in amplitude and three in length 
scale. 

- Fig. [8] shows that in redshift space the agreement im¬ 
proves to the point that there is no statistically significant 
discrepancy. 

- Assembly bias is present both in EAGLE and in its 
SHAM catalogues. Fig.[9]shows that assembly bias increases 
the clustering by about 20 %. 

Although small, the differences between EAGLE and SHAM 
are systematic and significant. We attribute these to SHAM 
slightly overpredicting, compared to EAGLE galaxies, the 
fraction of low-mass satellites in massive haloes. 

• Fig. 1121 shows that there is a relation between Mstar and 
halo mass at fixed Keiax. Gentrals hosted by more massive 
haloes typically have higher Mstar, formed more recently, and 
contain more gas than those hosted by smaller haloes. Satel¬ 
lites that reside in more massive haloes typically reduce their 
Mdm and Wire values more significantly, suffer from stronger 
stripping of gas, and stop forming stars before accretion and 
earlier than those in less massive haloes. The Mstar of satellite 
galaxies at 2 = 0 is independent of the host halo mass and it 
is rvj 20 % greater than the Mstar of central galaxies at fixed 

Vrelax. 

• Interactions between satellites and their host haloes are 
very important for the amplitude of the correlation function, 
especially on small scales. We show in Fig. [11] that the differ¬ 
ence between two extreme cases: where no stars are formed 
after accretion and where galaxies suffer no stripping of stars, 
result in differences in the amplitude of the two-point correla¬ 
tion function of ±20 % on large scales and almost a factor of 
2 on small scales. 

We note that, although the box size of EAGLE (100 Mpc) 
is among the largest for simulations of its type, it is not large 
enough to ensure converged clustering properties. The lack of 
long wavemodes produces a few-percent excess of halos with 
M < 10^“^ Mq and a larger deficiency of more massive halos. 
We expect this to reduce the satellite fraction, which may 
affect the shape and amplitude of overall correlation function, 
and might thus make our assessment of SHAM slightly too 
optimistic. 

Overall, our results confirm the usefulness of SHAM for 
interpreting and modelling galaxy clustering. However, they 
also highlight the limits of current SHAM implementations 
when an accuracy better than 20 % is required. Beyond this 
point, details of galaxy formation physics become important. 
For instance, SHAM assumes that the relation between Keiax 
and Mstar is independent of the host halo mass. However, the 
validity of this assumption depends on how efficiently the gas 
content of satellite galaxies is depleted after accretion, on the 
importance of the stripping of stars in different environments, 
and on the relation between Mdm and Mstar for centrals. EA¬ 
GLE suggests that these effects depend on the host halo mass 
(and thus possibly on cosmological parameters), which would 
break the family of one-parameter SHAM models. 

Fortunately, it seems possible that these physical pro- 
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cesses can be modelled, and marginalised over, within SHAM. 
An interesting line of development would be the extension of 
SHAM to a two-parameter model, for instance a function of 
Keiax and Mhaio- This would not only reduce the systematic 
biases in the correlation function, but would also increase the 
predictive power of SHAM for centrals. We plan to explore 
this in the future. 

Naturally, as hydrodynamical simulations improve their 
realism, it should be possible to better model the evolu¬ 
tion of galaxies hosted by massive clusters, which will lead 
to more accurate SHAM implementations and a more accu¬ 
rate assessment of its performance. Ultimately, these develop¬ 
ments will enable quick and precise predictions for the clus¬ 
tering of galaxies in the highly non-linear regime. In principle, 
this could be extended as a function of cosmology employ¬ 
ing, e.g., cosmology-sca ling methods (I Angulo fc White! I 2 OIOI : 
lAngulo Sz Hilberl]|2015l l. This opens up many interesting pos¬ 
sibilities, such as the direct use of SHAM to optimally exploit 
the overwhelmingly rich and accurate clustering measurements 
that are expected to arrive over the next decade. 
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APPENDIX A: RESOLUTION 

In this section we will present two tests that suggest that our 
results are not affected by the finite mass and force resolution 
of the EAGLE and DMO simulations. Specifically, we will ex¬ 
plore the number of DM particles of the SHAM galaxies and 
compare simulations with different resolutions. 

In Fig. [13] we show the PDF of the number of DM parti¬ 
cles associated with central (top panel) and satellite (bottom 
panel) SHAM galaxies. Coloured lines show the results for dif¬ 
ferent Mstar bins using Keiax- The detection threshold of our 
SUBFIND catalogues (20 particles) is marked by a vertical 
dashed line. The top panel shows that nearly all the central 
subhaloes are resolved more than 1000 DM particles. Satel¬ 
lites, on the other hand, are resolved with fewer particles be¬ 
cause some of them will be lost to tidal stripping. However, 
since the value of Ueiax will be acquired before the stripping 
begins, we do not expect this to affect our results. The only 



Figure 13. Number of DM particles in subhaloes of a given Mstar- 
The coloured lines represent the mean PDFs of 100 realizations 
using Veiax for different stellar mass bins and the errors are the 
standard deviation of the 100 realizations. The top (bottom) panel 
shows the PDFs of centrals (satellites). The black dashed line in¬ 
dicates the detection threshold of our SUBFIND catalogues. The 
centrals are always resolved with more than 1000 particles. How¬ 
ever, the satellites have a tail in their distribution which reaches 
the detection threshold. 


effect that might be important is that a subhalo can fall below 
the detection threshold while its counterpart galaxy is still re¬ 
solved. We see that this might be the case for a very small 
fraction subhaloes in the lowest Mgtar bin. We quantify these 
effects next. 

In Fig. [TT]we show the number density of satellites (top 
panel) and the satellite fraction (bottom panel) for three dif¬ 
ferent EAGLE simulations and their DMO counterparts. The 
black lines show the results for the same simulation used 
in this paper (Ref--L100N1504), the blue lines for a simula¬ 
tion with 25 Mpc on a side and the same resolution as Ref- 
L100N1504 (Ref-L025N376), and the red lines for a simulation 
with 25 Mpc on a side and eight times higher mass resolution 
than Ref-L100N1504 (Ref-L025N752). To estimate the cosmic 
variance, we divide Ref-L100N1504 into 64 boxes of 25 Mpc on 
a side; the grey shaded areas enclose the 68 % of these boxes. 
The regions enclosed by vertical dotted lines in the bottom 
panels indicate the bins employed in @ 

The left two panels show that galaxies according to Mstar 
or V)nax produce almost identical satellite fractions in both 
(Ref-L025N752) and (Ref-L025N386), despite the former hav¬ 
ing 8 times better mass resolution. The satellite fraction co¬ 
incides with our main EAGLE run for high number densities, 
but under-predicts the satellite fraction at low number densi- 
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Figure 14. Number density of satellites (top panels) and satellite fraction (bottom panels) vs. total number density. In the left, centre, and 
right panels subhaloes are ordered according to ^max^ respectively. Coloured lines show the results for different simulations. 

The grey shaded areas enclose the 68 % of the results after dividing the simulation with the largest volume into 64 smaller boxes of 25 Mpc 
on a side. The regions enclosed by dotted lines indicate the bins employed in El 


ties. This, however, is plausibly explained by cosmic variance 
and the lack of long wave modes due to the smaller volume 
(64 times). The rightmost panel shows the DM0 versions, for 
which the agreement between different resolutions is even bet¬ 
ter. Thus, this suggests that the results presented in this paper 
are not affected by the numerical resolution of our simulations. 


APPENDIX B: CORRELATION FUNCTION 
CALCULATION 

The two point correlation function (2PCF) counts the number 
of pairs at different distances in relation to the number of pairs 
that one would have expected from a random distribution (see, 
e.g., IPavis et al ] ll98,ql:IPeebQl200lh : 

dP = n"[l-he(ri2)]dVidU2, (3) 

where n is the mean density and ^(ri 2 ) the correlation func¬ 
tion. This equation describes the excess probability, compared 
with a random sample, of finding a point in an element of vol¬ 
ume dV 2 at a distance ri 2 from another point in dVi. The 
2PCF is also the Fourier transform of the power spectrum 
P(k): 


^ (^ / (4) 

and the power spectrum is defined as: 

(5(k)5(k')) = (27r)35D(k - k')P(k), (5) 

where (5(k) is the Fourier transform of the density contrast and 
(5D(k) is the Dirac delta function. We can use this property to 
quickly compute the 2PCF using the fast Fourier transform 
(FFT). To calculate the 2PCF, we follow the following steps: 


• We divide the simulation cube into 1024® boxes of 97.6 
kpc on a side. We determine in each box the density contrast 
using a cloud-in-cell (CIC) scheme. The density contrast is 
defined as: 


(5(x) 


N -{N) 
{N) 


where N is the number of subhaloes inside one box and 
is the total number of subhaloes in the simulation cube. 
• The Fourier transform (FT) of the density field is: 


( 6 ) 

(N) 


(5(k) = J dx^e *'''^(5(x), 


(7) 


we compute this FT using version 3.3.3 of the Fastest Fourier 
Transform in the West (FFTW3; http://www.fftw.org/), a 
compilation of C routines for computing the discrete Fourier 
transform. 

• We calculate P(k) using equation [5] and then we subtract 
the Poisson noise. The Poisson noise arises from sampling a 
continuous distribution with a discrete number of objects. It 
scales as 1/n, where n is the number density of objects. 

• The next step is to go back to real space by computing 
the FT of P(k), yielding the 2PCF. 

• Finally, we spherically average the correlation function 
obtaining the 3D 2PCF ^(|r|). 


By dividing the simulation cube into different number of 
cells, we verified that using 1024® boxes represents the clus¬ 
tering beyond 0.3 Mpc faithfully. 
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